Voting Massive Collections of Bayesian Network Classifiers for Data Streams
Abstract
We present a new method for voting exponential (in the number of attributes) size sets of Bayesian classifiers in polynomial time with polynomial memory requirements. Training is linear in the number of instances in the dataset and can be performed incrementally. This allows the collection to learn from massive data streams. The method allows for flexibility in balancing computational complexity, memory requirements and classification performance. Unlike many other incremental Bayesian methods, all statistics kept in memory are directly used in classification. Experimental results show that the classifiers perform well on both small and very large data sets, and that classification performance can be weighed against computational and memory costs.
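The abstract does not spell out the voting scheme, but the general flavour of summarising an exponential family of Bayesian classifiers with a polynomial table of counts can be illustrated with an AODE-style ensemble. The sketch below is an illustrative assumption only, not the paper's algorithm: the class name, the update/predict interface, and the Laplace smoothing are mine. Counts are updated incrementally in a single pass over the stream, all stored statistics are used directly at prediction time, and prediction averages over every one-dependence classifier.

```python
# Illustrative AODE-style sketch (an assumption, not the paper's exact method):
# average over all one-dependence Bayesian classifiers using only pairwise counts,
# so memory is polynomial in the number of attributes and training is one
# incremental pass over the stream. Attribute values are assumed discrete.
from collections import defaultdict

class AveragedOneDependenceVoter:
    def __init__(self, n_attributes):
        self.n = n_attributes
        self.pair = defaultdict(int)    # counts of (class, att i, value, att j, value)
        self.single = defaultdict(int)  # counts of (class, att i, value)
        self.cls = defaultdict(int)     # class counts
        self.total = 0

    def update(self, x, c):
        """Incrementally absorb one labelled instance (attribute-value list, class)."""
        self.total += 1
        self.cls[c] += 1
        for i, xi in enumerate(x):
            self.single[(c, i, xi)] += 1
            for j, xj in enumerate(x):
                if i != j:
                    self.pair[(c, i, xi, j, xj)] += 1

    def predict(self, x):
        """Vote all one-dependence classifiers: sum_i P(c, x_i) * prod_j P(x_j | c, x_i)."""
        best, best_score = None, -1.0
        for c in self.cls:
            score = 0.0
            for i, xi in enumerate(x):
                parent = self.single.get((c, i, xi), 0)
                if parent == 0:
                    continue
                p = (parent + 1.0) / (self.total + 1.0)  # smoothed estimate of P(c, x_i)
                for j, xj in enumerate(x):
                    if i != j:
                        p *= (self.pair.get((c, i, xi, j, xj), 0) + 1.0) / (parent + 2.0)
                score += p
            if score > best_score:
                best, best_score = c, score
        return best
```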
Similar Papers
A fast incremental extreme learning machine algorithm for data streams classification
Data stream classification is an important way to extract useful knowledge from massive, dynamic data. Because of concept drift, traditional data mining techniques cannot be applied directly in a data stream environment. The extreme learning machine (ELM) is a single hidden layer feedforward neural network (SLFN); compared with traditional neural networks (e.g. BP networks), ELM has a faster...
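For context, here is a minimal sketch of a plain batch ELM under standard assumptions (random hidden layer, pseudoinverse output weights); it is not the incremental algorithm of the cited paper. Because the hidden layer is random and fixed, training reduces to a single least-squares solve, which is the reason ELM trains much faster than a backpropagation network.

```python
# Minimal batch ELM sketch (assumption for illustration; not the cited
# incremental algorithm). Training fits only the output weights in closed form.
import numpy as np

def train_elm(X, T, n_hidden=50, seed=0):
    """X: (n_samples, n_features); T: (n_samples, n_classes) one-hot targets."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random, fixed input weights
    b = rng.standard_normal(n_hidden)                # random, fixed biases
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    beta = np.linalg.pinv(H) @ T                     # output weights via pseudoinverse
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```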
What Can We Do with Graph-Structured Data? - A Data Mining Perspective
Feedback in multimodal self-organizing networks enhances perception of corrupted stimuli, p. 19
Identification of fuzzy relation model using HFC-based parallel genetic algorithms and information data granulation, p. 29
Evaluation of incremental knowledge acquisition with simulated experts, p. 39
Finite domain bounds consistency revisited, p. 49
Speeding up weighted constraint satisfaction using red...
Mining multi-dimensional concept-drifting data streams using Bayesian network classifiers
In recent years, a plethora of approaches have been proposed to deal with the increasingly challenging task of mining concept-drifting data streams. However, most of these approaches can only be applied to uni-dimensional classification problems where each input instance has to be assigned to a single output class variable. The problem of mining multi-dimensional data streams, which includes mu...
Voting Principle Based on Nearest kernel classifier and Naive Bayesian classifier
This paper presents a voting principle based on multiple classifiers. The voting principle combines the naïve Bayesian classification algorithm with a newly proposed nearest-to-class-kernel classifier. The recognition ability of each classifier differs from sample to sample. A model of each classifier was obtained by training on the training data, which acts as bas...
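The snippet is truncated, but the underlying idea of combining two base classifiers by voting can be shown with a generic weighted-vote combiner. The interface, weights, and function name below are illustrative assumptions, not the paper's formulation.

```python
# Generic weighted-vote combiner (illustrative assumption, not the cited method).
import numpy as np

def weighted_vote(prob_a, prob_b, w_a=0.5, w_b=0.5):
    """Combine per-class probability estimates from two classifiers
    (e.g. a naive Bayes model and a nearest-kernel model) by weighted voting.
    prob_a, prob_b: arrays of shape (n_samples, n_classes)."""
    combined = w_a * np.asarray(prob_a) + w_b * np.asarray(prob_b)
    return np.argmax(combined, axis=1)  # index of the winning class per sample
```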
Bagged Voting Ensembles
Bayesian and decision tree classifiers are among the most popular classifiers in the data mining community, and numerous researchers have recently examined their suitability for ensembles. Although many methods of ensemble creation have been proposed, there is as yet no clear picture of which method is best. In this work, we propose Bagged Voting using different subsets of the same training...
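A minimal sketch of bagged voting under common assumptions (scikit-learn decision trees as base learners, bootstrap replicates, majority vote); the particular subset scheme proposed in the cited paper is not reproduced here.

```python
# Bagged voting sketch (generic bagging, not the cited paper's exact scheme).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_vote_fit(X, y, n_estimators=10, seed=0):
    """Train n_estimators trees, each on a bootstrap sample of (X, y) numpy arrays."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagged_vote_predict(models, X):
    """Majority vote over the ensemble's predictions (assumes integer class labels)."""
    votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```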